The MDL-principle in Testing Style Homogeneity between Literary Texts

Author

  • Mikhail Malyutov
Abstract

We study a new context-free, computationally simple, stylometry-based attributor: the sliced conditional compression complexity (sCCC, or simply CCC) of literary texts, introduced in [8] and inspired by the incomputable Kolmogorov conditional complexity (KC). Other stylometry tools can occasionally give almost coinciding values for different authors. Our CCC-attributor is asymptotically strictly minimal for the true author, provided that the query texts are sufficiently large but much smaller than the training texts, the universal compressor is good, and sampling bias is avoided. This classifier simplifies the homogeneity test of [13] (partly based on compression) when the difference between the unconditional complexities of the training and query texts is insignificant, a condition that can be checked using asymptotic normality [15] for IID and Markov sources and normal plots for real literary texts. It is consistent when a large text is approximated by a stationary ergodic sequence, owing to the lower bound for the minimax compression redundancy of piecewise stationary strings [11] (see also our elementary combinatorial arguments and simulation in [9] for IID sources). The CCC test is based on the t-ratio, which measures the mean difference of the slices' CCC in units of standard deviation. This enables evaluation of the corresponding P-value of statistical significance, based on the asymptotic normality of the slices' CCC; this normality has been verified empirically by normal plots in all cases studied and is expected to be proved soon for simplified statistical models of literary texts. The asymptotic study of the CCC is complemented by many literary case studies (see [8, 9, 10]).
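The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the choice of `lzma` as the "universal compressor", the helper names (`compressed_len`, `ccc`, `slices`, `t_ratio`), and the Welch-style form of the t-ratio are all assumptions made here for illustration.

```python
import lzma
from math import sqrt
from statistics import mean, stdev


def compressed_len(data: bytes) -> int:
    """Compressed size in bytes; lzma stands in for a universal compressor (assumption)."""
    return len(lzma.compress(data))


def ccc(training: bytes, query_slice: bytes) -> int:
    """Conditional compression complexity of a slice given a training text:
    the extra compressed bytes needed once the training text is known."""
    return compressed_len(training + query_slice) - compressed_len(training)


def slices(text: bytes, size: int) -> list[bytes]:
    """Cut a query text into consecutive, non-overlapping slices of equal size."""
    return [text[i:i + size] for i in range(0, len(text) - size + 1, size)]


def t_ratio(a: list[float], b: list[float]) -> float:
    """Mean difference of two samples in units of its estimated standard error
    (Welch two-sample form; an assumption of this sketch)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se
```

Under these assumptions, one would compute `ccc(training_A, s)` and `ccc(training_B, s)` for every slice `s` of the query text and pass the two lists to `t_ratio`; a clearly smaller mean CCC for one training text points to that candidate as the more likely author, in line with the minimality property stated in the abstract.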

Similar resources

Semantic Variation in Idiolect and Sociolect: Corpus Linguistic Evidence from Literary Texts

Idiolects are person-dependent similarities in language use. They imply that texts by one author show more similarities in language use than texts between authors. Sociolects, on the other hand, are group-dependent similarities in language use. They imply that texts by a group of authors, for instance in terms of gender or time period, share more similarities within a group than between groups....

Importance and Position of Form and Writing Style in Philosophizing

Philosophical texts have emerged in diverse genres and literary forms. Thinkers take different stands on the importance of the elements of these works. Some assign literary forms a merely ornamental and accidental role, while others, in contrast, see philosophical implications in these elements. Philosophers’ approach to the role of literary elements of philosophical text is influential on ...

Testing Problems in Russian as a Foreign Language in a Technical University

Problems in the theory and practice of testing Russian as a foreign language for entrants to technical universities are considered. The benefits of test forms for assessing foreign students’ Russian language skills under a strict time limit are presented. The structure and content of the tests, all types of tasks offered on the entrance and final examinations in the Russian languag...

The Effect of Authentic and Simplified Literary Texts on the Reading Comprehension of Iranian Advanced EFL Learners

The present quasi-experimental study mainly investigates the role of literature as input for reading comprehension in Iranian EFL classrooms. To be more exact, it investigates the effects of authentic and simplified literary texts on the reading comprehension of Iranian advanced EFL learners. The participants were 35 male and female Iranian EFL learners who were at advanced level, studying in a...

The Effect of Lexicon-based Debates on the Felicity of Lexical Equivalents in Translating Literary Texts by Iranian EFL Learners

This study was an attempt to investigate the effect of lexicon-based debates on the felicity of lexical equivalents in translating literary texts by Iranian EFL learners.  To fulfill the purpose of this study, 59 university students, majoring in English Translation, were randomly assigned to the experimental and control groups from a total of 73 students based on their performance on a mock TOE...

Publication date: 2009